Here are the packages we will be using:
library(statnet)
{igraph} and {statnet} are two of the most commonly used packages for social network analyses. Both include built-in functions that can plot networks and answer almost every beginner-level question about a given network. However, there may be some functions that are present in one package but not the other (or just work better). So, {intergraph} is another useful package that helps users transition between {igraph} and {statnet}. For this module, we will only need {statnet}.
When analyzing social network data, we want to know to what degree our variables impact the formation of a particular network organization. Our independent variables or predictor variables are node level variables (e.g. age) or similarity between node level variables (e.g. whether pairs belong to the same age category). Our dependent variable or response variable is the organization of edges in the network.
We cannot use standard OLS regressions to analyze pair characteristics and the presence of edges between pairs because it violates the assumption that observations are independent. For example, observations A-B, A-C, and A-D are not independent because they all involve individual A. Or if one bonobo is GG rubbing with 5 other bonobos, those 5 other bonobos may also be GG rubbing with each other. Quadratic assignment procedure (QAP) addresses this issue of non-independence.
QAP is a resampling-based method that controls for non-independence
in network structure via random permutations. These permutations use
different arrangements of the rows and columns in the adjacency matrix.
Thus, the network structure is maintained but the arrangement of
individuals in the structure is randomized. This represents the null
hypothesis because it should eliminate any potential correlations
between ties and independent values.
What kinds of questions would you ask with our bonobo data?
There are 3 types of QAP regressions we can run with {statnet}: correlation regressions, multiple/linear regressions, and logistic regressions. We will describe and demonstrate each regression using the bonobo data. We’ll also do the same with a more complex dataset from spider monkeys!
Correlation QAPs simply test whether two social networks are correlated. For example, using our bonobo datasets, we can ask: Do bonobos who groom each other correlate with bonobos who GG rub with one another?
What is our null hypothesis and alternative hypothesis in this question?
We can start by using the gcor() function to get a correlation coefficient between the two networks.
gcor(bonobo_ggr_net, bonobo_groom_net)
## [1] 0.1111054
What does this correlation coefficient signify?
These networks don’t seem very correlated but let’s test for significance anyways. The qaptest() function uses Monte Carlo simulations of likelihood quantiles to test arbitrary graph-level statistics against a QAP null hypothesis.
set.seed(123)
qap_cor <- qaptest(list(bonobo_ggr_net, bonobo_groom_net), # include both network objects in a list
gcor, # the function you're using is correlation between networks (gcor)
g1=1, # use first graph in the list (in this case the GG rubbing network)
g2=2, # use second graph in the list (in this case the groom network)
reps = 1000) # number of permutations to run (1000 is actually the default)
summary(qap_cor)
##
## QAP Test Results
##
## Estimated p-values:
## p(f(perm) >= f(d)): 0.328
## p(f(perm) <= f(d)): 0.81
##
## Test Diagnostics:
## Test Value (f(d)): 0.1111054
## Replications: 1000
## Distribution Summary:
## Min: -0.4191705
## 1stQ: -0.2070601
## Med: 0.005050247
## Mean: -0.00163123
## 3rdQ: 0.1111054
## Max: 0.8534918
Under “Test Diagnostics”, the test value represents the correlation coefficient between grooming and GG rubbing networks. This is interpreted at the level of the dyad, pairs who groom each other are likely not correlated to pairs who GG rub with one another. We previously calculated this using the gcor() function. A positive value indicates that networks positively correlate. In other words, one network has ties where the other also does. A negative value indicates that networks negatively correlate. In other words, one network has ties whether the other does not.
Under “Estimated p-values”, we find statistics that tell us about how this observed correlation compares to what the correlation looks like when we randomly permute one of our networks. Note that these are both one-tailed p-values! One tests whether the correlation is higher than expected by random chance (p(f(perm) >= f(d))), and the other tests whether the correlation is lower than expected by random chance (p(f(perm) <= f(d)). We can see that our p-values are greater than 0.05, meaning we have evidence to fail to reject our null hypothesis. There is no significant correlation between grooming and GG rubbing networks.
We can also visualize the “Distribution Summary” using the plot() function.
plot(qap_cor)
The dotted line gives us the correlation among the observed networks. The curve shows the distribution of correlations across the permuted networks. The permutations are random, so they are expected to form a normal distribution in which most of the permutations have a correlation of 0. The correlation of our observed networks falls within the typical amount of correlation expected from random networks.
Linear regression QAPs test the extent to which predictor variables are affecting edge weights. Recall that our predictor variables are node attributes.
For example, if our predictor variable is rank, we can ask: Does rank influence the edge weight of GG rubbing? In other words, we are asking: Is a bonobo more likely to have more weighted edges for each unit increase in rank?
Before we dive in, what do you predict? Do you think there is be a difference between actors (senders) and receivers? What does it mean to have more edge weight?
Predictor variables must be matrices in order to be used in the QAP functions. We can look at whether rank affects the weight of ties sent or received (or both, but they must be added to the model separately).
First, we must create a matrix that shows the rank of senders. Note that senders are represented by ROWS in the matrices.
nodes <- length(bonobo_attribute$actor) # number of nodes in the data set
rank <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we are interested in
rank_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) # create empty matrix to be filled for senders
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_sending[i,] <- rep(rank[i], nodes)} # the rank of each actor is repeated over entire ROW of matrix
rank_sending
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
Next, we must create a matrix that shows the rank of receivers. Note that receivers are represented by COLUMNS in the matrices.
rank_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes) # empty matrix for receivers
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_receiving[,i] <- rep(rank[i], nodes)} # the rank of each receiver is repeated over entire COLUMN of matrix
rank_receiving
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
Now we can run our linear regression! First we are asking: Does rank affect how many times bonobos GG rubbed with each others (the weight of the edges they sent)? Our null hypothesis is that rank does not affect how many times bonobos GG rub others (both high and low).
set.seed(123)
lrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_sending)) # list of predictor variables as network or matrix objects
summary(lrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.4534205 -0.3093578 -0.2036344 0.5465795 0.8695587
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.8507548 0.995 0.005 0.008
## x1 -0.1161796 0.011 0.991 0.021
##
## Residual standard error: 0.4587 on 40 degrees of freedom
## Multiple R-squared: 0.06227 Adjusted R-squared: 0.03883
## F-statistic: 2.656 on 1 and 40 degrees of freedom, p-value: 0.111
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.660544 -1.882904
## 1stQ -0.643181 -0.590200
## Median 0.098940 -0.007029
## Mean 0.103537 -0.027443
## 3rdQ 0.832046 0.496377
## Max 2.693891 1.874392
To interpret this we can imagine the x axis being rank and the y axis being edge weight. For every unit increase in rank, edge weights are going down 0.1161796 (the estimate of x1 in the above coefficients). This relationship is slightly significant at 0.021 on the two tailed test. However, if we look at our adjusted r^2 value, our model is only explaining ~4% of the variability. And our F-statistic p value is too high to be significant at 0.111. Taken together, we can say that the rank of sender is not significantly explaining the edge weight of GG rubbing interactions.
Now let’s look at the receivers.
Does rank affect how many times bonobos received GG rubbing from others (the weight of the edges they receive)? The null hypothesis is the same as above, that rank of receiver does not affect the number of times they are GG rubbed with.
set.seed(123)
lrqap2 <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_receiving)) # list of predictor variables as network or matrix objects
summary(lrqap2)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.3992884 -0.3393159 -0.2434687 0.6007116 0.8021901
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.64715036 0.997 0.003 0.006
## x1 -0.07247427 0.038 0.964 0.083
##
## Residual standard error: 0.4679 on 40 degrees of freedom
## Multiple R-squared: 0.02423 Adjusted R-squared: -0.0001605
## F-statistic: 0.9934 on 1 and 40 degrees of freedom, p-value: 0.3249
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.1166790 -1.2337177
## 1stQ -0.5527539 -0.4807737
## Median 0.0543513 -0.0004686
## Mean 0.0602322 -0.0068813
## 3rdQ 0.6924992 0.4798283
## Max 2.0115654 1.3478074
Just like with the actors/senders, let’s imagine that rank is on the x axis and edge weight is on the y axis. Again, we see a negative slope, though this one is even less steep at -0.07247427. The two tailed test p value for this variable is not significant. This model is explaining almost 0% of the variability, which means this model is really not a good fit for the residuals. We can imagine the points scattered far away from the line. The F-statistic p-value is too high to be significant at 0.3249, which is even higher than the sender p-value. Rank of receiver does not significantly explain the edge weight of GG rubbing interactions.
So we fail to reject the null hypothesis in this question too. But that begs the question, if rank is not influencing the edge weight, then is there some other attribute that is?
What if we included multiple predictor variables? We can add another network as a predictor variable! By doing so, we are conducting a multiple regression QAP. For example, we can now ask: Do sender rank and grooming relationships predict the number of times bonobos GG rubbed in the network? The null hypothesis is that neither rank of the actor nor grooming relationships affect the number of times bonobos initiate GG rubbing (two tailed test).
set.seed(123)
mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_sending, rank_receiving, bonobo_groom_net)) # list of all predictor variables as network or matrix objects
summary(mrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.7038861 -0.3321997 -0.1704610 0.5354274 0.7466385
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 1.3835945 0.993 0.007 0.011
## x1 -0.1392945 0.007 0.993 0.018
## x2 -0.1036147 0.020 0.980 0.051
## x3 0.1510413 0.798 0.202 0.413
##
## Residual standard error: 0.4541 on 38 degrees of freedom
## Multiple R-squared: 0.1269 Adjusted R-squared: 0.05801
## F-statistic: 1.842 on 3 and 38 degrees of freedom, p-value: 0.1561
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1 x2 x3
## Min -3.12987 -2.37270 -1.70518 -2.94942
## 1stQ -0.83095 -0.64332 -0.47930 -0.92280
## Median 0.02042 -0.03306 0.06527 -0.20974
## Mean 0.04138 -0.03371 0.03601 0.12131
## 3rdQ 0.80742 0.56561 0.57086 0.81042
## Max 2.82364 2.30921 2.00442 10.68268
Are any of the above relationships significant when taken into the model together? Is this model a better fit than the others? How can we tell?
Logistic regression QAPs test the extent to which independent variables are affecting the presence or absence of edges, but not the weight of those edges. If you have a network with binary ties, we can use this regression method to predict ties in the outcome network (response variable). Using our bonobo dataset, we can ask: Does higher rank make an individual more or less likely to G-G rub with other individuals?
In this example, our GG rubbing network is the response variable while age is our predictor variable. If you imagine rank (as the predictor variable) on the x axis, for every unit of rank increase for a bonobo, are they more likely to GG rub with more individuals? Will an older individual have more ties on the social network?
If we were using a different predictor variable, then we would need to create new matrices for senders and receivers. However, we are using the same one as before (rank) so we can skip that step this time. Now we can run our logistic regression!
What is our null hypothesis for this question?
The netlogit() function performs a logistic regression of the network variable for the response variable (bonobo_ggr_net) on the network variables in the predictor variables (age_receiving and age_sending). The resulting fits and coefficients are tested against the null hypothesis.
set.seed(123)
logqap <- netlogit(bonobo_ggr_net, # response variable is a network object
list(rank_receiving, rank_sending), # list of all predictor variables as network or matrix objects
reps = 1000) # number of draws for quantile estimation, 1000 reps is the default
summary(logqap)
##
## Network Logit Model
##
## Coefficients:
## Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 4.5524763 94.8670398 0.974 0.026 0.042
## x1 -0.4903676 0.6124013 0.020 0.980 0.044
## x2 -0.6817886 0.5057117 0.015 0.985 0.023
##
## Goodness of Fit Statistics:
##
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 47.45396 on 39 degrees of freedom
## Chi-Squared test of fit improvement:
## 10.77041 on 3 degrees of freedom, p-value 0.01303442
## AIC: 53.45396 BIC: 58.66696
## Pseudo-R^2 Measures:
## (Dn-Dr)/(Dn-Dr+dfn): 0.2040994
## (Dn-Dr)/Dn: 0.1849811
## Contingency Table (predicted (rows) x actual (cols)):
##
## Actual
## Predicted 0 1
## 0 25 11
## 1 4 2
##
## Total Fraction Correct: 0.6428571
## Fraction Predicted 1s Correct: 0.3333333
## Fraction Predicted 0s Correct: 0.6944444
## False Negative Rate: 0.8461538
## False Positive Rate: 0.137931
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Distribution Summary:
##
## (intercept) x1 x2
## Min -2.099992 -1.536187 -2.015489
## 1stQ -0.697576 -0.590863 -0.614552
## Median -0.026785 -0.008864 0.055893
## Mean -0.002342 -0.011792 0.030776
## 3rdQ 0.704731 0.553704 0.642645
## Max 1.996422 1.661594 2.026379
As a reminder, we are testing whether rank affects the likelihood of giving and receiving GG rubbing. In the output above, x1 represents the first variable in the model (rank_receiving), and x2 represents the second variable in the model (rank_sending).
Because they hypothesis we were testing was whether there is any relationship between rank and sending and receiving ties, we will use the two-tailed p-valued in the column labeled Pr(>=|b|). In this case, both are significant if our alpha value is 0.05; rank_receiving squeaks in right under it! They both have negative slopes as well.
The first column, “Estimate” is the log odds which is difficult to interpret. Think of it as a tendency to form or not form ties. The log odds for rank_receiving is -0.4903676, so there is an increasing tendency with rank to not receive ties. The second column, Exp(b), gives odds ratios instead, which are easier to understand. The odds ratio gives the likelihood of a tie to form (or not form) under one condition in contrast to another condition. The odds ratio of rank_receiving is 0.6124013. So for each unit of rank increase a female bonobo is, she is 0.6124013 times as likely to receive any given tie compared to if she were lower ranked. An odds ratio less than 1 is associated with a lower odds of outcome (receiving GG rubbing in this case). More generally, they are less likely to receive GG rubbing as they rise in the ranks. The same can be said for sending ties as they get higher ranked. This slope is even steeper at -0.6817886 and an even lower odds ratio of 0.5057117. With each unit of rank, the less likely a bonobo is to initiate gg rubbing. The Chi-squared goodness of fit test p-value is significant.
A follow up question given these results could be: are higher ranked bonobos less likely to participate in GG rubbing as often as lower ranked bonobos? We could answer this with a linear regression QAP because we would be looking at the weight of edges in this case. The multiple regression we did before, which also took grooming into account, looked at this and may help address this question.
You may have noticed when we visualized the bonobo data that the individuals sorted into two groups that did not have any observed interactions. The original dataset actually used bonobos living in two separate groups. While bonobos sometimes interact with conspecifics in other groups, no such interactions were observed in this data. Therefore, belonging to the same group appears to be an important predictor for whether a dyad will have an interaction. This may be obscuring other trends when we do not account for group belonging in our models.
We can account for the bonobos’ group belonging including a variable that accounts for homophily, the tendency of actors to form edges with others who are similar to them in some way. To do this we need to create a group homophily variable as a binary adjacency matrix. In this matrix, “1” or “TRUE” will indicate that a sender and receiver share the same group value. “0” or “FALSE” will indicate that they do not share the same group value.
The function outer() will take two vectors (or two of the same vector in this case) and create a new matrix based on a specified function applied to every combination of terms in the vector. The function “==” tells outer() to create a matrix that reports TRUE/FALSE for every combination of terms.
group <- bonobo_ggr_net %v% "group" # first create a vector giving the group of each node
group_homophily <- outer(group, group, FUN = "==") # shows us whether each combination of terms in vector group is the same (TRUE) or different (FALSE)
group_homophily
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [2,] FALSE TRUE TRUE FALSE FALSE FALSE TRUE
## [3,] FALSE TRUE TRUE FALSE FALSE FALSE TRUE
## [4,] TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [5,] TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [6,] TRUE FALSE FALSE TRUE TRUE TRUE FALSE
## [7,] FALSE TRUE TRUE FALSE FALSE FALSE TRUE
For example, bonobos 1 and 2 belong to different groups, so cell [1,2] contains FALSE. We can now use this matrix as a predictor variable of group homophily in our logistic model from earlier.
set.seed(123)
logqap2 <- netlogit(bonobo_ggr_net,
list(rank_receiving, rank_sending, group_homophily),
reps = 1000)
summary(logqap2)
##
## Network Logit Model
##
## Coefficients:
## Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 4.2407432 69.4594517 0.966 0.034 0.065
## x1 -0.5134417 0.5984324 0.036 0.964 0.064
## x2 -0.7186133 0.4874277 0.009 0.991 0.017
## x3 1.1806238 3.2564048 0.868 0.132 0.218
##
## Goodness of Fit Statistics:
##
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 44.77452 on 38 degrees of freedom
## Chi-Squared test of fit improvement:
## 13.44985 on 4 degrees of freedom, p-value 0.00927464
## AIC: 52.77452 BIC: 59.72519
## Pseudo-R^2 Measures:
## (Dn-Dr)/(Dn-Dr+dfn): 0.2425588
## (Dn-Dr)/Dn: 0.2310003
## Contingency Table (predicted (rows) x actual (cols)):
##
## Actual
## Predicted 0 1
## 0 25 8
## 1 4 5
##
## Total Fraction Correct: 0.7142857
## Fraction Predicted 1s Correct: 0.5555556
## Fraction Predicted 0s Correct: 0.7575758
## False Negative Rate: 0.6153846
## False Positive Rate: 0.137931
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Distribution Summary:
##
## (intercept) x1 x2 x3
## Min -1.891645 -1.657796 -2.010851 -2.303175
## 1stQ -0.654240 -0.541867 -0.606004 -0.987334
## Median -0.021953 -0.003103 0.044755 -0.384264
## Mean 0.007076 -0.012917 0.030103 -0.111246
## 3rdQ 0.681959 0.511718 0.666226 0.741716
## Max 2.013238 1.635185 1.981424 3.058840
At a glance, including group homophily does not improve this model. There is not a significant relationship between group homophily and the presence/absence of a given tie. The AIC value is only slightly lower for the model including group homphily than the one without group homophily.
Our bonobo data was a little anti-climactic. But sometimes, that’s the nature of small sample primatology work. We are at the hour mark. If you would like to continue on, we have another data set. To make things a little more exciting, let’s run the three QAPs again with some spider monkey data, but this time you can try it more on your own.
Anzà, S., Demuru, E., & Palagi, E. (2021). Sex and grooming as exchange commodities in female bonobos’ daily biological market. Scientific reports, 11(1), 19344.
David, H. A. (1987). Ranking from unbalanced paired-comparison data. Biometrika, 74(2), 432–436. https://doi.org/10.1093/biomet/74.2.432
De Vries, H., Stevens, J. M. G., & Vervaecke, H. (2006). Measuring and testing the steepness of dominance hierarchies. Animal Behaviour, 71(3), 585–592. https://doi.org/10.1016/j.anbehav.2005.05.015
Luke, Douglas A. 2015. A User’s Guide to Network Analysis in R. Springer.
Rimbach, R., Bisanzio, D., Galvis, N., Link, A., Di Fiore, A., & Gillespie, T. R. (2015). Brown spider monkeys (Ateles hybridus): A model for differentiating the role of social networks and physical contact on parasite transmission dynamics. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1669), 20140110. https://doi.org/10.1098/rstb.2014.0110
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge University Press.